
A typical machine learning project follows a general sequence of analysis stages for building a predictive model. The steps followed in this analysis are:
1. Understanding the problem domain
2. Data Exploration and Preparation
3. Feature Extraction
4. Dimensionality Reduction (or Feature Selection)
5. Evaluation of Various Models
6. Hyper-parameter Tuning
7. Ensembling: Model Selection
Kickstarter is an American public-benefit corporation based in Brooklyn, New York, that maintains a global crowdfunding platform focused on creativity. The company's stated mission is to "help bring creative projects to life". Kickstarter has reportedly received more than $1.9 billion in pledges from 9.4 million backers to fund 257,000 creative projects, such as films, music, stage shows, comics, journalism, video games, technology and food-related projects. People who back Kickstarter projects are offered tangible rewards or experiences in exchange for their pledges. This model traces its roots to the subscription model of arts patronage, in which artists would go directly to their audiences to fund their work.
Project Owner's Perspective:
Kickstarter's Perspective: A large amount of manual effort is required to screen a project before it is approved to be hosted on the platform. Key ingredients for a project to be successful.
List of possible predicting factors:
Total Projects: 378661 Total Features: 15
| | ID | name | category | main_category | currency | deadline | goal | launched | pledged | state | backers | country | usd pledged | usd_pledged_real | usd_goal_real |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1000002330 | The Songs of Adelaide & Abullah | Poetry | Publishing | GBP | 2015-10-09 | 1000.0 | 2015-08-11 12:12:28 | 0.0 | failed | 0 | GB | 0.0 | 0.0 | 1533.95 |
| 1 | 1000003930 | Greeting From Earth: ZGAC Arts Capsule For ET | Narrative Film | Film & Video | USD | 2017-11-01 | 30000.0 | 2017-09-02 04:43:57 | 2421.0 | failed | 15 | US | 100.0 | 2421.0 | 30000.00 |
| 2 | 1000004038 | Where is Hank? | Narrative Film | Film & Video | USD | 2013-02-26 | 45000.0 | 2013-01-12 00:20:50 | 220.0 | failed | 3 | US | 220.0 | 220.0 | 45000.00 |
| 3 | 1000007540 | ToshiCapital Rekordz Needs Help to Complete Album | Music | Music | USD | 2012-04-16 | 5000.0 | 2012-03-17 03:24:11 | 1.0 | failed | 1 | US | 1.0 | 1.0 | 5000.00 |
| 4 | 1000011046 | Community Film Project: The Art of Neighborhoo... | Film & Video | Film & Video | USD | 2015-08-29 | 19500.0 | 2015-07-04 08:35:03 | 1283.0 | canceled | 14 | US | 1283.0 | 1283.0 | 19500.00 |
The following columns are not useful for the analysis and can be removed: ID, goal, pledged, usd pledged, and currency.
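
Dropping these columns could look like the following minimal sketch. It assumes the data is held in a pandas DataFrame named `df` (a hypothetical name) and rebuilds only two rows of the sample table above for illustration; the column label `usd pledged` is written with a space, as it appears in the table header.

```python
import pandas as pd

# Two rows copied from the sample table above, for illustration only.
df = pd.DataFrame({
    "ID": [1000002330, 1000003930],
    "name": ["The Songs of Adelaide & Abullah",
             "Greeting From Earth: ZGAC Arts Capsule For ET"],
    "category": ["Poetry", "Narrative Film"],
    "main_category": ["Publishing", "Film & Video"],
    "currency": ["GBP", "USD"],
    "deadline": ["2015-10-09", "2017-11-01"],
    "goal": [1000.0, 30000.0],
    "launched": ["2015-08-11 12:12:28", "2017-09-02 04:43:57"],
    "pledged": [0.0, 2421.0],
    "state": ["failed", "failed"],
    "backers": [0, 15],
    "country": ["GB", "US"],
    "usd pledged": [0.0, 100.0],
    "usd_pledged_real": [0.0, 2421.0],
    "usd_goal_real": [1533.95, 30000.00],
})

# Columns listed in the text as not useful for the analysis.
columns_to_drop = ["ID", "goal", "pledged", "usd pledged", "currency"]
df = df.drop(columns=columns_to_drop)

print(df.columns.tolist())
```

Passing the names through `columns=` raises a `KeyError` if any of them is misspelled, which is a useful check given the mixed naming in this dataset.
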
Canceled State - About 10% of the projects in this dataset are in the canceled state. From a business perspective this could be read as a failure: the project owner, at some point while the campaign was live, concluded that it would not succeed and canceled the campaign. There are several other possible reasons as well.
For example, the project owner may have obtained funding from elsewhere, or the project requirements may have changed, leading the owner to recreate the online crowdfunding campaign.
Since the dataset gives neither a clear reason for a project being canceled nor the date on which it was canceled, the canceled state is treated here as a separate state and not as failed.
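
As a rough illustration of keeping the canceled state as its own class, the sketch below computes the share of each state on a small made-up sample. The sample values are hypothetical, and the label `successful` is an assumption, since the sample table shows only `failed` and `canceled`.

```python
import pandas as pd

# Hypothetical toy sample of the `state` column, not the real data.
states = pd.Series([
    "failed", "failed", "successful", "canceled", "failed",
    "successful", "canceled", "failed", "successful", "failed",
])

# Share of each state; "canceled" stays a separate class
# instead of being merged into "failed".
state_share = states.value_counts(normalize=True)
print(state_share)
```
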
Observations from the scatter plot:
Additionally, about 13% of the projects have not raised a single penny and are either canceled or failed.
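
A share of this kind could be computed along the following lines. This is only a sketch on hypothetical toy rows: the frame name `toy` and its values are made up, so the result does not reproduce the 13% figure.

```python
import pandas as pd

# Hypothetical toy rows using two columns from the dataset.
toy = pd.DataFrame({
    "state": ["failed", "canceled", "failed", "successful", "failed"],
    "usd_pledged_real": [0.0, 0.0, 220.0, 5000.0, 1.0],
})

# Projects that are failed or canceled AND raised nothing at all.
unsuccessful = toy["state"].isin(["failed", "canceled"])
raised_nothing = toy["usd_pledged_real"] == 0

# The mean of a boolean mask is the fraction of rows where it is True.
share_zero = (unsuccessful & raised_nothing).mean()
print(f"{share_zero:.0%} of projects raised nothing and failed or were canceled")
```
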
Note: 1e8 is standard scientific notation, and here it indicates an overall scale factor for the y-axis. That is, if there is a 2 on the y-axis and a 1e8 at the top, the value at 2 actually indicates 2 × 1e8 = 2e8 = 2 × 10^8 = 200,000,000.
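
The same arithmetic as a tiny check, with illustrative variable names:

```python
# The offset printed at the top of the axis multiplies every tick label.
tick_label = 2   # number printed next to the tick
scale = 1e8      # offset printed at the top of the y-axis

actual_value = tick_label * scale
print(f"{actual_value:,.0f}")  # 200,000,000
```
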
A general guideline for skewness is that if the number is greater than +1 or lower than –1, this is an indication of a substantially skewed distribution. For kurtosis, the general guideline is that if the number is greater than +1, the distribution is too peaked. Likewise, a kurtosis of less than –1 indicates a distribution that is too flat.
Interpreting Skewness: If skewness is less than −1 or greater than +1, the distribution is highly skewed. If skewness is between −1 and −½ or between +½ and +1, the distribution is moderately skewed. If skewness is between −½ and +½, the distribution is approximately symmetric.

| Variable | Skewness |
|---|---|
| state | -0.271761 |
| backers | 86.294188 |
| usd_pledged_real | 82.063085 |
| usd_goal_real | 12.765938 |

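
The rule of thumb above can be written as a small helper and applied to the reported skewness values. The function name `describe_skewness` is hypothetical, and placing a value of exactly ±1 in the "moderately skewed" band is one reading of the stated thresholds.

```python
def describe_skewness(skew: float) -> str:
    """Map a skewness value to the rule-of-thumb label used in the text."""
    magnitude = abs(skew)
    if magnitude > 1:
        return "highly skewed"
    if magnitude >= 0.5:
        return "moderately skewed"
    return "approximately symmetric"


# Skewness values as reported for this dataset.
reported = {
    "state": -0.271761,
    "backers": 86.294188,
    "usd_pledged_real": 82.063085,
    "usd_goal_real": 12.765938,
}

labels = {name: describe_skewness(value) for name, value in reported.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```
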
Numeric variables such as backers, usd_pledged_real, and usd_goal_real are highly right-skewed because so many failed projects have no backers or pledged amount at all. This will be addressed through data normalization while developing the model.
To explore these variables, they need to be transformed first; histograms can then be created to visualize their distributions.
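
The exact transform is not stated here. One candidate is log(1 + x), which is defined at zero and matches the minimum of 0.009950 reported for usd_goal_real_log in the summary that follows, given a minimum goal of 0.01. The sketch below assumes that transform; the helper name `log_transform` is hypothetical.

```python
import math


def log_transform(amount: float) -> float:
    """Assumed transform: log(1 + x), so a zero amount maps to 0.0."""
    return math.log1p(amount)


min_goal = 0.01                 # smallest goal amount mentioned in the text
min_goal_log = log_transform(min_goal)
zero_pledged_log = log_transform(0.0)

print(round(min_goal_log, 6))   # close to the reported minimum of 0.009950
print(zero_pledged_log)         # projects with nothing pledged stay at 0.0
```

A plain logarithm would be undefined for the many projects with zero pledged, which is why a shifted version is a natural choice for these columns.
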

| | state | usd_goal_real_log | usd_pledged_real_log |
|---|---|---|---|
| count | 369678.000000 | 369678.000000 | 369678.000000 |
| mean | 1.257500 | 8.632460 | 5.775453 |
| std | 0.632728 | 1.671539 | 3.309677 |
| min | 0.000000 | 0.009950 | 0.000000 |
| 25% | 1.000000 | 7.601402 | 3.526361 |
| 50% | 1.000000 | 8.612685 | 6.456770 |
| 75% | 2.000000 | 9.662097 | 8.314587 |
| max | 2.000000 | 14.591996 | 16.828050 |

The minimum goal amount is as small as 0.01.